Abstract: Keyword search over a graph finds a subgraph that contains a set of query keywords. A problem with most existing keyword search methods is that they may produce duplicate answers, that is, answers that contain the same set of content nodes (i.e., nodes containing a query keyword) even though these nodes are connected differently in each answer. Thus, users may be presented with many similar answers that differ only trivially. In addition, some of the nodes in an answer may contain only query keywords that are already covered by other nodes in the answer. Removing these nodes does not change the keyword coverage of the answer but can make the answer more compact. In this paper, answers in which each content node contains at least one unique query keyword are called minimal answers. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Extensive performance studies using two large real data sets confirm the efficiency and effectiveness of the proposed methods.

Keywords: Keyword search, graph data, polynomial delay, approximation algorithm.
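
To make the two definitions in the abstract concrete, the following minimal sketch checks minimality and filters duplicates on a toy representation. It is an illustration under stated assumptions, not the algorithms proposed in the paper: an answer is assumed to be a dictionary mapping each content node identifier to the keywords that node contains, connecting nodes and edges are omitted, and the names `is_minimal` and `drop_duplicates` are hypothetical.

```python
from typing import Dict, List, Set

# Assumed toy representation: content node id -> keywords stored at that node.
Answer = Dict[str, Set[str]]


def is_minimal(answer: Answer, query: Set[str]) -> bool:
    """Return True if every content node covers at least one query
    keyword that no other content node in the answer covers."""
    for node, keywords in answer.items():
        own = keywords & query
        # Query keywords covered by all the other content nodes.
        covered_by_others: Set[str] = set()
        for other, other_keywords in answer.items():
            if other != node:
                covered_by_others |= other_keywords & query
        if not (own - covered_by_others):
            return False  # this node is redundant for keyword coverage
    return True


def drop_duplicates(answers: List[Answer]) -> List[Answer]:
    """Keep the first answer seen for each distinct set of content nodes."""
    seen = set()
    unique = []
    for answer in answers:
        signature = frozenset(answer)  # the set of content node ids
        if signature not in seen:
            seen.add(signature)
            unique.append(answer)
    return unique


query = {"graph", "search", "keyword"}
a1 = {"n1": {"graph", "search"}, "n2": {"keyword"}}
a2 = {"n1": {"graph", "search"}, "n2": {"keyword"}, "n3": {"search"}}
a3 = {"n2": {"keyword"}, "n1": {"graph", "search"}}  # same content nodes as a1

minimal_flags = [is_minimal(a, query) for a in (a1, a2, a3)]
unique_answers = drop_duplicates([a1, a2, a3])
```

Here `a2` is not minimal because `n3` contributes only `search`, which `n1` already covers, and `a3` is treated as a duplicate of `a1` because both have the content node set `{n1, n2}`, regardless of how those nodes might be connected.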